Ratio Rule Mining from Multiple Data Sources

نویسندگان

  • Jun Yan
  • Ning Liu
چکیده

Both multiple source data mining and streaming data mining problems have attracted much attention in the past decade. In contrast to traditional association-rule mining, to capture the quantitative association knowledge, a new paradigm called Ratio Rule (RR) was proposed recently. We extend this framework to mining ratio rules from multiple source data streams which is a novel and challenging problem. The traditional techniques used for ratio rule mining is an eigen-system analysis which can often fall victim to noises. The multiple data sources impose additional constraints for the mining procedure to be robust in the presence of noise, because it is difficult to clean all the data sources in real time in real-world tasks. In addition, the traditional batch methods for ratio rules cannot cope with data streams. In this paper, we propose an integrated method to mining ratio rules from data streams from multiple data sources, by first mining the ratio rules from each data source respectively through a novel robust and adaptive one-pass algorithm (which is called Robust and Adaptive Ratio Rule (RARR)), and then integrating the rules of each data source in a simple probabilistic model with a rule-clustering procedure. In this way, we can acquire the global rules from all the local information sources incrementally. We show that the RARR can converge to a fixed point and is robust as well. Moreover, the integration of rules is efficient and effective. Both theoretical analysis and experiments illustrate that the performance of RARR and the proposed information integration procedure is satisfactory for the purpose of discovering latent associations in multiple-source data streams.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A new approach based on data envelopment analysis with double frontiers for ranking the discovered rules from data mining

Data envelopment analysis (DEA) is a relatively new data oriented approach to evaluate performance of a set of peer entities called decision-making units (DMUs) that convert multiple inputs into multiple outputs. Within a relative limited period, DEA has been converted into a strong quantitative and analytical tool to measure and evaluate performance. In an article written by Toloo et al. (2009...

متن کامل

Synthesizing Global Negative Association Rules in Multi-Database Mining

Association rule mining has been widely adopted by data mining community for discovering relationship among item-sets that co-occur together frequently. Besides positive association rules, negative association rule mining, which find out negation relationships of frequent item-sets are also important. The importance of negative association rule mining is accounted in customer-driven domains suc...

متن کامل

Data sanitization in association rule mining based on impact factor

Data sanitization is a process that is used to promote the sharing of transactional databases among organizations and businesses, it alleviates concerns for individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...

متن کامل

Combined Mining Approach to Generate Patterns for Complex Data

In Data mining applications, which often involve complex data like multiple heterogeneous data sources, user preferences, decision-making actions and business impacts etc., the complete useful information cannot be obtained by using single data mining method in the form of informative patterns as that would consume more time and space, if and only if it is possible to join large relevant data s...

متن کامل

Pattern Generation for Complex Data Using Hybrid Mining

Combined mining is a hybrid mining approach for mining informative patterns from single or multiple data-sources, multiple-features extraction and applying multiple-methods as per the requirements. Data mining applications often involve complex data like multiple heterogeneous data sources, different user preference and create decision-making actions. The complete useful information may not be ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006